Automated design of efficient fail-safe fault tolerance
Abstract
Both the scale and the reach of computer systems and embedded devices have increased steadily over the last decade. As these systems become pervasive, our reliance on them grows, and with it our expectation that they continue to deliver services even in the presence of faults; that is, we expect computer systems to be dependable. One way to ensure the continuous delivery of dependable services is replication, which, however, is expensive, so we focus on a cheaper alternative: software-based fault tolerance. Different levels of fault tolerance can be provided, for example masking fault tolerance and fail-safe fault tolerance. In this thesis, we focus on providing fail-safe fault tolerance. Intuitively, a fail-safe fault-tolerant program is one that is allowed to “halt” when faults occur, as long as it always remains in a “safe” state. Moreover, we endeavor to synthesize efficient fail-safe fault tolerance. We use two common criteria to assess the efficiency of a fail-safe fault-tolerant program: (i) error detection latency (latency for short), i.e., how fast the program can detect an erroneous state, and (ii) error detection coverage (coverage for short), i.e., the ratio of “harmful” errors the program can detect. We present a formal framework for the design of efficient fail-safe fault-tolerant programs. The framework is based on a refined theory of detectors, which introduces novel insights into their working principles. We introduce the concept of a perfect detector, which allows a fail-safe fault-tolerant program to have perfect detection. This means that a program composed with perfect detectors has optimal detection coverage, optimal in the sense that the detectors detect all of the “harmful” errors and make no mistakes.
Then, we present the concept of fast detection, and show how a fail-safe fault-tolerant program can have both perfect and fast error detection. In fact, the detection latency is shown to be minimal, i.e., the error is detected...
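The two efficiency criteria named in the abstract, coverage and latency, can be illustrated with a small sketch. It is not the thesis's formal framework: the integer state model, the threshold-based detectors, and the names `coverage`, `false_alarms`, and `latency` are all assumptions made for illustration, and latency is measured here simply as the number of trace steps between the first harmful state and the first state the detector flags.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: a state is an integer, and a detector is a predicate on states.
State = int
Detector = Callable[[State], bool]


def coverage(harmful: Sequence[State], detector: Detector) -> float:
    """Fraction of the given harmful error states that the detector flags."""
    if not harmful:
        return 1.0  # nothing to detect, so nothing is missed
    return sum(1 for s in harmful if detector(s)) / len(harmful)


def false_alarms(harmless: Sequence[State], detector: Detector) -> int:
    """Number of harmless states that the detector wrongly flags."""
    return sum(1 for s in harmless if detector(s))


def latency(trace: Sequence[State],
            is_harmful: Callable[[State], bool],
            detector: Detector) -> Optional[int]:
    """Steps from the first harmful state in the trace to the first flagged
    state at or after it; None if no harmful state occurs or none is flagged."""
    for i, s in enumerate(trace):
        if is_harmful(s):
            for j in range(i, len(trace)):
                if detector(trace[j]):
                    return j - i
            return None
    return None


# Assumed example: states >= 10 are the "harmful" errors.
def is_harmful(s: State) -> bool:
    return s >= 10


def perfect(s: State) -> bool:
    return s >= 10  # flags every harmful state and nothing else


def weak(s: State) -> bool:
    return s >= 12  # misses the harmful states 10 and 11


harmful_states = [10, 11, 12, 13]
harmless_states = [0, 5, 9]
trace = [1, 5, 10, 11, 12]

perfect_coverage = coverage(harmful_states, perfect)
weak_coverage = coverage(harmful_states, weak)
perfect_false_alarms = false_alarms(harmless_states, perfect)
perfect_latency = latency(trace, is_harmful, perfect)
weak_latency = latency(trace, is_harmful, weak)
```

In this toy setting the detector named `perfect` matches the abstract's description of a perfect detector (it flags all harmful errors and raises no false alarms) and it also flags the error in the same step in which it occurs, whereas `weak` both misses harmful states and reacts later.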
Similar resources
Designing Efficient Fail-Safe Multitolerant Systems
In this paper, we propose a method for designing efficient fail-safe multitolerant systems. A multitolerant system is one that is able to tolerate multiple types of faults, and a fail-safe multitolerant system handles the various fault types in a fail-safe manner. Efficiency issues of interest are fault tolerance-related, and they are: (i) completeness, and (ii) accuracy. Based on earlier work,...
A Fail-Safe CMOS Logic Gate
This paper reports a design technique to make complex CMOS gates fail-safe for a class of faults. Two classes of faults are defined. The fail-safe design presented has limited fault-tolerance capability. Multiple faults are also covered.
A Framework for the Design and Validation of Efficient Fail-Safe Fault-Tolerant Programs
We present a framework that facilitates synthesis and validation of fail-safe fault-tolerant programs. Starting from a fault-intolerant program, with safety specification SS, that satisfies its specification in the absence of faults, we present an approach that automatically transforms it into a fail-safe fault-tolerant program, through the addition of a class of detectors termed as SS-globally...
Modeling Fault-tolerant Distributed Systems for Discrete Controller Synthesis
Embedded systems require safe design methods based on formal methods, as well as safe execution based on fault-tolerance techniques. We propose a safe design method for safe execution systems: it uses discrete controller synthesis (DCS) to generate a correct reconfiguring system. The properties enforced concern consistent execution, functionality fulfillment (whatever the faults, under some fai...
Byzantine Fault Tolerant Authentication
A Byzantine fault tolerant public key infrastructure is presented. It aims to fulfill the authentication requirements of large distributed systems consisting of semi-trusted parties. The distributed trust model does not demand the existence of predefined trusted parties and provides authentication if more than a threshold of the participants are honest. A voting based protocol implements distri...
Publication date: 2003